Introduction

library(GO.db)
library(graph)
library(dnet)
library(magrittr)
library(tidyverse)
library(pander)
library(scales)
library(plotly)

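The aim of this page is to keep an up-to-date summary of the Gene Ontology database, as represented in the R package GO.db. The information contained for each GO term is: the ontology it belongs to, its shortest and longest paths back to the ontology root, and whether it is a terminal node.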

The main object created is available for download here, and will load as a tibble using the function read_rds() (from the package readr).

Create the Basic Graphs

The package AnnotationDbi contains the function makeGOGraph which returns a graphNEL graph. In the following, we will:

  1. Create a graph for each ontology and remove the node "all" as this is essentially redundant
  2. Reverse the direction of the DAG for compatibility with tools in the package dnet. Whilst hosted on CRAN, this package depends on the Bioconductor package supraHex and should be installed using BiocManager::install("dnet").

Note that the following code can take several minutes to run as these are large graphs.

graphs <- c(BP = "bp", CC = "cc", MF = "mf") %>%
    lapply(makeGOGraph) %>%
    lapply(function(x){removeNode("all", x)}) %>%
    lapply(dDAGreverse)

Summary of graph sizes for each ontology

Ontology    Nodes    Edges
BP         29,699   71,745
CC          4,202    7,534
MF         11,148   13,681

Find the Key Information

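goSummaries <- lapply(graphs, function(x){
    # Node levels under each path definition, shifted down by 1
    lng <- dDAGlevel(x, "longest_path") - 1
    shrt <- dDAGlevel(x, "shortest_path") - 1
    # Terminal nodes (tips) of the DAG
    tips <- dDAGtip(x)
    # Index both vectors by id so the two path lengths stay aligned
    id <- unique(c(names(lng), names(shrt)))
    tibble(
        id = id,
        shortest_path = unname(shrt[id]),
        longest_path = unname(lng[id]),
        terminal_node = id %in% tips
        )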
}) %>%
    bind_rows() %>%
    mutate(ontology = Ontology(id))

Path lengths for each ontology, based on whether a term is a terminal node or not.

Cumulative number of GO Terms with paths \(\geq\) x.
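
To make the two path definitions concrete, the following is a minimal base R sketch on a toy DAG. The node names, the edge list and the helper pathToRoot() are all hypothetical and exist only for illustration; this is not how dnet computes its levels, just a small recursion showing how the shortest and longest paths back to the root can differ for the same term.

```r
# Toy DAG as a child -> parent edge list (hypothetical terms, not real GO IDs)
edges <- data.frame(
    child  = c("A", "B", "C", "C", "D"),
    parent = c("root", "root", "A", "root", "C"),
    stringsAsFactors = FALSE
)
nodes <- unique(c(edges$child, edges$parent))

# Hypothetical helper: number of steps from a node back to the root,
# choosing the shortest (pick = min) or longest (pick = max) route
pathToRoot <- function(node, pick) {
    if (node == "root") return(0)
    parents <- edges$parent[edges$child == node]
    1 + pick(vapply(parents, pathToRoot, numeric(1), pick = pick))
}

toySummary <- data.frame(
    id = nodes,
    shortest_path = vapply(nodes, pathToRoot, numeric(1), pick = min),
    longest_path = vapply(nodes, pathToRoot, numeric(1), pick = max),
    # A terminal node never appears as a parent of another node
    terminal_node = !nodes %in% edges$parent,
    row.names = NULL
)
toySummary
```

In this toy graph, node C is one step from the root directly but two steps via A, so its shortest and longest paths differ, and only B and D are terminal nodes.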

Examples

In practice, this table can be joined to the results table from tools like goana() and used to filter GO terms before adjusting p-values.
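
A minimal sketch of that filtering step, using made-up values rather than real results. Here goResults is a hypothetical stand-in for goana() output (GO IDs as row names, raw p-values in the column P.DE) and toySummaries stands in for goSummaries; the IDs, counts, p-values and path lengths are invented purely for illustration, and the threshold is only one possible choice.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for goana() output, with GO IDs as row names
goResults <- data.frame(
    Term = c("term A", "term B", "term C", "term D"),
    Ont = c("BP", "BP", "MF", "CC"),
    N = c(500, 40, 25, 10),
    DE = c(60, 12, 9, 1),
    P.DE = c(0.001, 0.004, 0.02, 0.6),
    row.names = c("GO:A", "GO:B", "GO:C", "GO:D")
)

# Hypothetical stand-in for goSummaries, with made-up path lengths
toySummaries <- tibble(
    id = c("GO:A", "GO:B", "GO:C", "GO:D"),
    shortest_path = c(2, 5, 6, 3),
    longest_path = c(3, 7, 6, 5),
    terminal_node = c(FALSE, TRUE, TRUE, FALSE)
)

filtered <- goResults %>%
    rownames_to_column("id") %>%                   # move GO IDs into a column
    left_join(toySummaries, by = "id") %>%         # attach the path information
    filter(shortest_path > 4) %>%                  # discard terms close to the root
    mutate(adjP = p.adjust(P.DE, method = "fdr"))  # adjust the retained terms only
filtered
```

Because the adjustment is applied after filtering, the multiple-testing correction is based only on the retained terms.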

As an example of how to use this to assist our decision making, if we chose to remove GO terms with a shortest path \(\leq\) 4, we can see how many terms we would discard and retain.

ggplotly(goSummaries %>%
    mutate(keep = shortest_path > 4) %>%
    ggplot(aes(keep, fill = terminal_node)) +
    geom_bar() +
    facet_wrap(~ontology, nrow = 1) +
    scale_y_continuous(labels = comma) +
    labs(x = "Term Retained", y = "Total") +
    theme_bw())
goSummaries %>%
    mutate(keep = shortest_path > 4) %>%
    group_by(ontology, terminal_node, keep) %>%
    tally() %>%
    spread(key = keep, value = n) %>%
    rename(Discard = `FALSE`,
           Retain = `TRUE`) %>%
    bind_rows(
        tibble(ontology = "**Total**",
               Discard = sum(.$Discard),
               Retain = sum(.$Retain))) %>%
    pander(big.mark = ",",
           justify = "llrr")

ontology   terminal_node   Discard   Retain
BP         FALSE             8,149    9,346
BP         TRUE              4,200    8,004
CC         FALSE             1,178      280
CC         TRUE              2,093      651
MF         FALSE               987    1,084
MF         TRUE              2,142    6,935
Total      NA               18,749   26,300

Alternatively, we could remove GO terms whose longest path back to the root node is \(\leq 5\) steps.

ggplotly(goSummaries %>%
    mutate(keep = longest_path > 5) %>%
    ggplot(aes(keep, fill = terminal_node)) +
    geom_bar() +
    facet_wrap(~ontology, nrow = 1) +
    scale_y_continuous(labels = comma) +
    labs(x = "Term Retained", y = "Total") +
    theme_bw())
goSummaries %>%
    mutate(keep = longest_path > 5) %>%
    group_by(ontology, terminal_node, keep) %>%
    tally() %>%
    spread(key = keep, value = n) %>%
    rename(Discard = `FALSE`,
           Retain = `TRUE`) %>%
    bind_rows(
        tibble(ontology = "**Total**",
               Discard = sum(.$Discard),
               Retain = sum(.$Retain))) %>%
    pander(big.mark = ",",
           justify = "llrr")

ontology   terminal_node   Discard   Retain
BP         FALSE             3,400   14,095
BP         TRUE                889   11,315
CC         FALSE               407    1,051
CC         TRUE                562    2,182
MF         FALSE             1,227      844
MF         TRUE              5,837    3,240
Total      NA               12,322   32,727

The summaries obtained above can be downloaded from here. Place the file in the appropriate folder, and the object can then be imported using read_rds("path/to/goSummaries.RDS").
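
As a minimal sketch of the import step, the example below writes a small made-up tibble to a temporary file and reads it back with read_rds(). The toy object and the temporary file are assumptions standing in for the downloaded file; with the real download, only the read_rds() call with your own path is needed.

```r
library(readr)
library(tibble)

# Hypothetical stand-in for the downloaded object (values are made up)
toy <- tibble(
    id = c("GO:A", "GO:B"),
    shortest_path = c(2, 5),
    longest_path = c(3, 7),
    terminal_node = c(FALSE, TRUE),
    ontology = c("BP", "MF")
)

# A temporary file plays the role of path/to/goSummaries.RDS
rdsFile <- tempfile(fileext = ".RDS")
write_rds(toy, rdsFile)

# The import itself: the object loads as a tibble
imported <- read_rds(rdsFile)
imported
```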

Session Info

R version 3.5.2 (2018-12-20)

**Platform:** x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_AU.UTF-8, LC_NUMERIC=C, LC_TIME=en_AU.UTF-8, LC_COLLATE=en_AU.UTF-8, LC_MONETARY=en_AU.UTF-8, LC_MESSAGES=en_AU.UTF-8, LC_PAPER=en_AU.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_AU.UTF-8 and LC_IDENTIFICATION=C

attached base packages: parallel, stats4, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: bindrcpp(v.0.2.2), plotly(v.4.8.0), scales(v.1.0.0), pander(v.0.6.3), forcats(v.0.3.0), stringr(v.1.3.1), dplyr(v.0.7.8), purrr(v.0.2.5), readr(v.1.3.1), tidyr(v.0.8.2), tibble(v.2.0.0), ggplot2(v.3.1.0), tidyverse(v.1.2.1), magrittr(v.1.5), dnet(v.1.1.4), supraHex(v.1.20.0), hexbin(v.1.27.2), igraph(v.1.2.2), graph(v.1.60.0), GO.db(v.3.7.0), AnnotationDbi(v.1.44.0), IRanges(v.2.16.0), S4Vectors(v.0.20.1), Biobase(v.2.42.0) and BiocGenerics(v.0.28.0)

loaded via a namespace (and not attached): nlme(v.3.1-137), lubridate(v.1.7.4), bit64(v.0.9-7), httr(v.1.4.0), Rgraphviz(v.2.26.0), tools(v.3.5.2), backports(v.1.1.3), R6(v.2.3.0), DBI(v.1.0.0), lazyeval(v.0.2.1), colorspace(v.1.3-2), withr(v.2.1.2), tidyselect(v.0.2.5), bit(v.1.1-14), compiler(v.3.5.2), cli(v.1.0.1), rvest(v.0.3.2), Cairo(v.1.5-9), xml2(v.1.2.0), labeling(v.0.3), digest(v.0.6.18), rmarkdown(v.1.11), pkgconfig(v.2.0.2), htmltools(v.0.3.6), highr(v.0.7), htmlwidgets(v.1.3), rlang(v.0.3.0.1), readxl(v.1.2.0), rstudioapi(v.0.8), RSQLite(v.2.1.1), shiny(v.1.2.0), bindr(v.0.1.1), generics(v.0.0.2), jsonlite(v.1.6), crosstalk(v.1.0.0), Matrix(v.1.2-15), Rcpp(v.1.0.0), munsell(v.0.5.0), ape(v.5.2), stringi(v.1.2.4), yaml(v.2.2.0), MASS(v.7.3-51.1), plyr(v.1.8.4), grid(v.3.5.2), blob(v.1.1.1), promises(v.1.0.1), crayon(v.1.3.4), lattice(v.0.20-38), haven(v.2.0.0), hms(v.0.4.2), knitr(v.1.21), pillar(v.1.3.1), reshape2(v.1.4.3), glue(v.1.3.0), evaluate(v.0.12), data.table(v.1.11.8), modelr(v.0.1.2), httpuv(v.1.4.5.1), cellranger(v.1.1.0), gtable(v.0.2.0), assertthat(v.0.2.0), xfun(v.0.4), mime(v.0.6), xtable(v.1.8-3), broom(v.0.5.1), later(v.0.7.5), viridisLite(v.0.3.0) and memoise(v.1.1.0)